bugprone-narrowing-conversions by Jacobfaib · Pull Request #9505 · NVIDIA/cccl

Jacobfaib · 2026-06-17T17:36:40Z

Description

closes

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

cub/cub/detail/warpspeed/allocators/smem_allocator.cuh (1)

53-60: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

important: Handle align == 0 before round-up math in alloc.
With the current formula on Line 56, align defaulting to 0 makes (align - 1) underflow and can force ptrAllocation32 to 0, which breaks allocation monotonicity. The cast on Line 59 then hides this by folding a wrapped unsigned delta into int. Guard zero alignment (e.g., normalize to 1 or assert/reject) before computing ptrAllocation32.

libcudacxx/benchmarks/bench/find_if/basic.cu (1)

27-33: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

suggestion: mismatch_point can be elements when MismatchAt is 1.0, but state.add_global_memory_reads<T>(mismatch_point + 1) then records elements + 1 reads. Cap the reported read count to elements to avoid skewed benchmark metadata for that axis value.

thrust/benchmarks/bench/none_of/basic.cu (1)

21-27: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

suggestion: for MismatchAt == 1.0, mismatch_point becomes elements, so state.add_global_memory_reads<T>(mismatch_point + 1) overreports by one. Please clamp the accounting value to elements to keep benchmark stats accurate.

cub/test/catch2_test_device_reduce_env_api.cu (1)

418-446: ⚠️ Potential issue | 🟠 Major

important: Line 443 validates device results after launching work on stream_ref, but this block does not synchronize that stream first. Unlike the other stream tests in this file, this can race outstanding work and make the assertion nondeterministic. Add stream.sync(); before the REQUIRE checks.

🧹 Nitpick comments (11)

thrust/thrust/detail/complex/cexpf.h (1)

61-61: ⚡ Quick win

important: Line 61 still evaluates the exponent expression in uint32_t before casting, so underflow can occur in unsigned space and then convert to int with implementation-defined semantics. Cast to signed before arithmetic (e.g., const int exp_bits = static_cast<int>(hx >> 23); *expt = exp_bits - (0x7f + 127) + static_cast<int>(k);).

cudax/benchmarks/bench/copy/copy_bench.cu (1)

25-25: important: this benchmark .cu change should include before/after performance evidence (and SASS-diff check if codegen changed), otherwise we can’t confirm it is regression-free.

As per coding guidelines, “Do not commit SASS code changes without running benchmarks to check for performance regressions,” and benchmark changes should follow docs/cub/benchmarking.rst.

Source: Coding guidelines

libcudacxx/benchmarks/bench/all_of/basic.cu (1)

27-27: important: please attach benchmark comparison output for this arithmetic/codegen-touching benchmark change (base vs new), so we can confirm no performance regression.

As per coding guidelines, benchmark updates should follow docs/cub/benchmarking.rst, and .cu SASS-affecting changes require benchmark validation.

Source: Coding guidelines

libcudacxx/benchmarks/bench/rotate_copy/basic.cu (1)

24-24: important: this benchmark arithmetic change should include base/new benchmark results to validate no throughput regression in rotate_copy.

As per coding guidelines, benchmark changes should follow docs/cub/benchmarking.rst, and .cu changes that can alter generated code should be benchmark-validated.

Source: Coding guidelines

thrust/benchmarks/bench/any_of/basic.cu (1)

21-21: important: please provide before/after any_of benchmark output for this change to confirm no regression in measured performance.

As per coding guidelines, benchmark updates should follow docs/cub/benchmarking.rst, and .cu changes with potential codegen impact should be validated with benchmarks.

Source: Coding guidelines

c/experimental/stf/test/test_cuda_kernel.cu (1)

20-21: ⚡ Quick win

suggestion: declare tid and nthreads as const int since they are not reassigned; this matches the repository rule that non-mutating locals should be const.

As per coding guidelines: “All non-const variables must use const.”

Source: Coding guidelines

c/experimental/stf/test/test_host_launch.cu (1)

20-21: ⚡ Quick win

suggestion: make tid and nthreads const int; both are immutable after initialization and should follow the project’s const-local requirement.

As per coding guidelines: “All non-const variables must use const.”

Source: Coding guidelines

c/experimental/stf/test/test_logical_data_with_place.cu (1)

26-26: ⚡ Quick win

suggestion: change int i to const int i here; the value is not mutated and the repo style requires const locals when possible.

As per coding guidelines: “All non-const variables must use const.”

Source: Coding guidelines

thrust/examples/sum_columns.cu (1)

43-43: ⚡ Quick win

suggestion: add <cstddef> for std::ptrdiff_t and make this local const auto since it is not reassigned. As per coding guidelines, files must include headers for symbols they use (no transitive reliance), and non-const variables should be const.

Source: Coding guidelines

thrust/examples/sum_rows.cu (1)

43-43: ⚡ Quick win

suggestion: include <cstddef> for std::ptrdiff_t and change row_idx_end to const auto because it is never modified. As per coding guidelines, do not rely on transitive includes and use const for non-mutated variables.

Source: Coding guidelines

c/parallel/test/test_segmented_reduce.cpp (1)

416-417: ⚡ Quick win

suggestion: add <cstddef> because these lines use std::ptrdiff_t, and make both iterator locals const auto since they are read-only. As per coding guidelines, files must include headers for used symbols and non-mutated locals should be const.

Also applies to: 487-488

Source: Coding guidelines

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: db866020-3a6c-4d48-a5b0-c6db8daef821

📥 Commits

Reviewing files that changed from the base of the PR and between a4ffbcf and bfc3ca8.

📒 Files selected for processing (167)

.clang-tidy
c/experimental/stf/test/test_async_resources_handle.cu
c/experimental/stf/test/test_cuda_kernel.cu
c/experimental/stf/test/test_host_launch.cu
c/experimental/stf/test/test_logical_data_with_place.cu
c/experimental/stf/test/test_stackable.cu
c/experimental/stf/test/test_stream_ctx_override.cu
c/experimental/stf/test/test_task_get_graph.cu
c/parallel/test/test_histogram.cpp
c/parallel/test/test_segmented_reduce.cpp
c/parallel/test/test_segmented_sort.cpp
c/parallel/test/test_util.h
c2h/include/c2h/cpu_timer.h
cub/benchmarks/bench/merge/merge_common.cuh
cub/cub/agent/agent_batch_memcpy.cuh
cub/cub/agent/agent_histogram.cuh
cub/cub/agent/agent_radix_sort_histogram.cuh
cub/cub/agent/agent_radix_sort_onesweep.cuh
cub/cub/agent/agent_radix_sort_upsweep.cuh
cub/cub/agent/agent_reduce_by_key.cuh
cub/cub/agent/agent_rle.cuh
cub/cub/agent/agent_scan.cuh
cub/cub/agent/agent_scan_by_key.cuh
cub/cub/agent/agent_select_if.cuh
cub/cub/agent/agent_three_way_partition.cuh
cub/cub/agent/agent_topk.cuh
cub/cub/agent/agent_unique_by_key.cuh
cub/cub/agent/single_pass_scan_operators.cuh
cub/cub/block/block_radix_rank.cuh
cub/cub/block/specializations/block_reduce_warp_reductions.cuh
cub/cub/detail/warpspeed/allocators/smem_allocator.cuh
cub/cub/detail/warpspeed/look_ahead.cuh
cub/cub/detail/warpspeed/squad/squad.cuh
cub/cub/detail/warpspeed/sync_handler.cuh
cub/cub/device/dispatch/dispatch_scan_by_key.cuh
cub/cub/device/dispatch/kernels/kernel_histogram.cuh
cub/cub/device/dispatch/kernels/kernel_reduce_deterministic.cuh
cub/cub/device/dispatch/kernels/kernel_scan_lookahead.cuh
cub/cub/device/dispatch/kernels/kernel_segmented_reduce.cuh
cub/cub/util_ptx.cuh
cub/cub/warp/specializations/warp_scan_shfl.cuh
cub/examples/block/example_block_radix_sort.cu
cub/examples/block/example_block_reduce.cu
cub/examples/block/example_block_scan.cu
cub/examples/device/example_device_partition_flagged.cu
cub/examples/device/example_device_partition_if.cu
cub/examples/device/example_device_select_flagged.cu
cub/examples/device/example_device_select_if.cu
cub/examples/device/example_device_select_unique.cu
cub/examples/device/example_device_sort_find_non_trivial_runs.cu
cub/examples/device/example_device_topk_pairs.cu
cub/test/catch2_large_array_sort_helper.cuh
cub/test/catch2_large_problem_helper.cuh
cub/test/catch2_segmented_sort_helper.cuh
cub/test/catch2_test_block_load_to_shared.cu
cub/test/catch2_test_block_scan.cu
cub/test/catch2_test_block_shuffle.cu
cub/test/catch2_test_device_reduce_env_api.cu
cub/test/catch2_test_device_scan.cuh
cub/test/catch2_test_device_segmented_radix_sort_pairs.cu
cub/test/catch2_test_device_segmented_scan_api.cu
cub/test/catch2_test_device_segmented_scan_noncommutative.cu
cub/test/catch2_test_device_select_env_api.cu
cub/test/catch2_test_device_topk_common.cuh
cub/test/catch2_test_device_transform.cu
cub/test/catch2_test_device_transform_api.cu
cub/test/catch2_test_launch_wrapper.cu
cub/test/catch2_test_memcpy_vectorized_copy.cu
cub/test/catch2_test_radix_operations.cu
cub/test/catch2_test_warp_exchange.cuh
cub/test/catch2_test_warp_load.cu
cub/test/catch2_test_warp_merge_sort.cu
cub/test/catch2_test_warp_scan.cu
cub/test/catch2_test_warp_scan_api.cu
cub/test/catch2_test_warp_scan_partial.cu
cub/test/catch2_test_warp_scan_partial_helper.cuh
cub/test/catch2_test_warp_scan_partial_invalid.cu
cub/test/catch2_test_warp_store.cu
cub/test/test_allocator.cu
cub/test/test_util.h
cub/test/warp/catch2_test_warp_reduce_batched.cu
cub/test/warp/catch2_test_warp_segmented_reduce.cu
cudax/benchmarks/bench/copy/copy_bench.cu
cudax/examples/simple_p2p.cu
cudax/examples/stdexec_stream.cu
cudax/examples/vector_add.cu
cudax/test/copy/copy.cu
cudax/test/copy_bytes/mdspan_d2h_h2d.cu
cudax/test/copy_bytes/mdspan_d2h_h2d_relaxed.cu
cudax/test/cuco/hyperloglog/test_hyperloglog.cu
libcudacxx/benchmarks/bench/adjacent_find/basic.cu
libcudacxx/benchmarks/bench/all_of/basic.cu
libcudacxx/benchmarks/bench/any_of/basic.cu
libcudacxx/benchmarks/bench/equal/basic.cu
libcudacxx/benchmarks/bench/find/basic.cu
libcudacxx/benchmarks/bench/find_if/basic.cu
libcudacxx/benchmarks/bench/find_if_not/basic.cu
libcudacxx/benchmarks/bench/is_heap/basic.cu
libcudacxx/benchmarks/bench/is_heap_until/basic.cu
libcudacxx/benchmarks/bench/is_partitioned/basic.cu
libcudacxx/benchmarks/bench/is_sorted/basic.cu
libcudacxx/benchmarks/bench/is_sorted_until/basic.cu
libcudacxx/benchmarks/bench/mismatch/basic.cu
libcudacxx/benchmarks/bench/none_of/basic.cu
libcudacxx/benchmarks/bench/rotate/basic.cu
libcudacxx/benchmarks/bench/rotate_copy/basic.cu
libcudacxx/benchmarks/bench/shift_left/basic.cu
libcudacxx/benchmarks/bench/shift_right/basic.cu
libcudacxx/include/cuda/__barrier/barrier_block_scope.h
libcudacxx/test/libcudacxx/cuda/ccclrt/hierarchy/hierarchy_smoke.cu
libcudacxx/test/libcudacxx/cuda/containers/buffer/transform.cu
libcudacxx/test/support/test_pstl.h
nvbench_helper/nvbench_helper/nvbench_helper.cuh
test/cuda_smoke/cuda_runtime_smoke.cu
thrust/benchmarks/bench/all_of/basic.cu
thrust/benchmarks/bench/any_of/basic.cu
thrust/benchmarks/bench/equal/basic.cu
thrust/benchmarks/bench/find/basic.cu
thrust/benchmarks/bench/find_if/basic.cu
thrust/benchmarks/bench/find_if_not/basic.cu
thrust/benchmarks/bench/is_partitioned/basic.cu
thrust/benchmarks/bench/is_sorted/basic.cu
thrust/benchmarks/bench/is_sorted_until/basic.cu
thrust/benchmarks/bench/mismatch/basic.cu
thrust/benchmarks/bench/none_of/basic.cu
thrust/examples/bucket_sort2d.cu
thrust/examples/cuda/wrap_pointer.cu
thrust/examples/discrete_voronoi.cu
thrust/examples/dot_products_with_zip.cu
thrust/examples/monte_carlo.cu
thrust/examples/monte_carlo_disjoint_sequences.cu
thrust/examples/padded_grid_reduction.cu
thrust/examples/sort.cu
thrust/examples/sum_columns.cu
thrust/examples/sum_rows.cu
thrust/examples/transform_iterator.cu
thrust/testing/binary_search_vector.cu
thrust/testing/catch2_test_adjacent_difference.cu
thrust/testing/catch2_test_transform.cu
thrust/testing/cuda/copy_if.cu
thrust/testing/cuda/sort.cu
thrust/testing/cuda/transform.cu
thrust/testing/dereference.cu
thrust/testing/for_each.cu
thrust/testing/gather.cu
thrust/testing/generate.cu
thrust/testing/merge.cu
thrust/testing/merge_by_key.cu
thrust/testing/partition.cu
thrust/testing/remove.cu
thrust/testing/replace.cu
thrust/testing/reverse.cu
thrust/testing/scan.cu
thrust/testing/shuffle.cu
thrust/testing/tabulate.cu
thrust/testing/transform_scan.cu
thrust/testing/unique_by_key.cu
thrust/testing/zip_function.cu
thrust/thrust/detail/complex/catrig.h
thrust/thrust/detail/complex/catrigf.h
thrust/thrust/detail/complex/cexp.h
thrust/thrust/detail/complex/cexpf.h
thrust/thrust/detail/complex/math_private.h
thrust/thrust/system/cuda/detail/core/util.h
thrust/thrust/system/cuda/detail/reduce.h
thrust/thrust/system/cuda/detail/reduce_by_key.h
thrust/thrust/system/cuda/detail/set_operations.h

💤 Files with no reviewable changes (1)

.clang-tidy

coderabbitai · 2026-06-17T17:48:57Z

 {
  auto h_pairs = thrust::make_zip_iterator(h_res_keys.begin(), h_res_values.begin());
-  thrust::sort(h_pairs, h_pairs + h_res_keys.size());
+  thrust::sort(h_pairs, h_pairs + static_cast<std::ptrdiff_t>(h_res_keys.size()));


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

important: std::ptrdiff_t is introduced on Line 74 without adding <cstddef>. Add an explicit #include <cstddef> to keep this TU self-sufficient and avoid transitive-include dependency.

As per coding guidelines, “Files must include all headers related to the symbols that they are using and relying on transitive header inclusion is not allowed.”

Source: Coding guidelines

coderabbitai · 2026-06-17T17:48:57Z

    for (int j = 0; j < N; ++j)
    {
-      host_data[i * N * 2 + j] = i * N + j;
+      host_data[static_cast<std::size_t>(i) * N * 2 + j] = static_cast<int>(static_cast<std::size_t>(i) * N + j);


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

important: Line 369 hardcodes row stride as N * 2 instead of using Ld, which makes the test silently wrong if Ld is changed independently. Use static_cast<std::size_t>(i) * Ld + j to match the declared stride contract.

coderabbitai · 2026-06-17T17:48:57Z

+    for (int i = static_cast<int>(cuda::gpu_thread.rank(cuda::grid)); i < a.size();
+         i += static_cast<int>(cuda::gpu_thread.count(cuda::grid)))


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

suggestion: use cuda::std::span<T>::size_type (or auto) for the grid-stride loop index/stride instead of int. On Line 73 and Line 74, int can overflow for large buffers while a.size() is wider, which can break termination/correct indexing in stress cases.

coderabbitai · 2026-06-17T17:48:57Z

 __global__ void increment_kernel(int* p, int n)
 {
-  int idx = blockIdx.x * blockDim.x + threadIdx.x;
+  int idx = static_cast<int>(blockIdx.x * blockDim.x + threadIdx.x);


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

important: Line 18 should declare idx as const because it is never reassigned.
As per coding guidelines, “All non-const variables must use const.”

Source: Coding guidelines

github-actions · 2026-06-17T23:34:44Z

😬 CI Workflow Results

🟥 Finished in 5h 52m: Pass: 99%/503 | Total: 22d 17h | Max: 2h 52m | Hits: 26%/3096939

See results here.

miscco

yeah this is just noise

Jacobfaib · 2026-06-22T12:31:00Z

yeah this is just noise

I disagree. This checks makes developers carefully consider the size type and does illustrate some possible real bugs. There's a lot of places where we take for granted that something fits in an int when we could/should instead use unsigned values to either be safe or to telegraph the true range of the value.

Granted the way I've silenced a lot of these is noise (by just static_cast-ing them), but this is because the real fix of changing sizing types in CUB needs careful consideration.

fbusato · 2026-06-22T16:33:44Z

I don't have a strong preference. The warning is useful. This will force developers to think about potential data loss. On the other hand, it is very verbose. The thread hierarchy API will help a lot for device code (instead of blockIdx.x * ...).
Also, is the warning overlapped with -Wnarrowing?

miscco · 2026-06-22T17:46:18Z

  const auto common_prefix  = state.get_float64("MismatchAt");
-  const auto mismatch_point = cuda::std::clamp<std::size_t>(elements * common_prefix, 0, elements - 2);
+  const auto mismatch_point = cuda::std::clamp<std::size_t>(
+    static_cast<std::size_t>(static_cast<double>(elements) * common_prefix), std::size_t{0}, elements - 2);


We should just ensure that common_prefix is size_t

bugprone-narrowing-conversions

bfc3ca8

Jacobfaib self-assigned this Jun 17, 2026

Jacobfaib requested review from a team as code owners June 17, 2026 17:36

github-project-automation Bot added this to CCCL Jun 17, 2026

Jacobfaib requested review from alliepiper and gonidelis June 17, 2026 17:36

github-project-automation Bot moved this to Todo in CCCL Jun 17, 2026

Jacobfaib requested review from caugonnet, elstehle and wmaxey June 17, 2026 17:36

cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 17, 2026

This comment was marked as off-topic.

Sign in to view

coderabbitai Bot reviewed Jun 17, 2026

View reviewed changes

miscco requested changes Jun 22, 2026

View reviewed changes

github-project-automation Bot moved this from In Review to In Progress in CCCL Jun 22, 2026

miscco reviewed Jun 22, 2026

View reviewed changes

miscco approved these changes Jun 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bugprone-narrowing-conversions#9505

bugprone-narrowing-conversions#9505
Jacobfaib wants to merge 1 commit into
NVIDIA:mainfrom
Jacobfaib:jacobf/2026-06-17/bugprone-narrowing-conversions

Jacobfaib commented Jun 17, 2026

Uh oh!

This comment was marked as off-topic.

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot Jun 17, 2026

Uh oh!

coderabbitai Bot Jun 17, 2026

Uh oh!

coderabbitai Bot Jun 17, 2026

Uh oh!

coderabbitai Bot Jun 17, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026

Uh oh!

miscco left a comment

Uh oh!

Jacobfaib commented Jun 22, 2026

Uh oh!

fbusato commented Jun 22, 2026

Uh oh!

miscco Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		for (int i = static_cast<int>(cuda::gpu_thread.rank(cuda::grid)); i < a.size();
		i += static_cast<int>(cuda::gpu_thread.count(cuda::grid)))

Conversation

Jacobfaib commented Jun 17, 2026

Description

Checklist

Uh oh!

This comment was marked as off-topic.

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 17, 2026

😬 CI Workflow Results

🟥 Finished in 5h 52m: Pass: 99%/503 | Total: 22d 17h | Max: 2h 52m | Hits: 26%/3096939

Uh oh!

miscco left a comment

Choose a reason for hiding this comment

Uh oh!

Jacobfaib commented Jun 22, 2026

Uh oh!

fbusato commented Jun 22, 2026

Uh oh!

miscco Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants